A stochastic process’s statistical complexity stands out as a fundamental property: the minimum

Discovering and describing correlation and pattern are
critical to progress in the physical sciences. Observing
the weather in California last summer, we find a long
series of sunny days interrupted only rarely by rain, a
pattern now all too familiar to residents. Analogously,
a one-dimensional spin system in a magnetic field might
have most of its spins “up” with just a few “down”:
defects determined by the details of spin coupling and
thermal fluctuations. Though nominally the same pattern,
the domains of these systems span the macroscopic to the
microscopic, the multi-layer to the pure. Despite the
gap, can we meaningfully compare these two patterns?

To exist on an equal descriptive footing, they must
each be abstracted from their physical embodiment by,
for example, expressing their generating mechanisms via
minimal probabilistic encodings. Measures of
unpredictability, memory, and structure then naturally
arise as information-theoretic properties of these
encodings. Indeed, the fundamental interpretation of
(Shannon) information is as a rate of encoding such
sequences. This recasts the informational properties as
answers to distinct communication problems. For instance,
a process’s structure becomes the problem of two
observers, Alice and Bob, synchronizing their predictions
of the process.

However, what if the communication between Alice
and Bob is not classical? What if Alice instead sends
qubits to Bob; that is, they synchronize over a quantum
channel? Does this change the communication
requirements? More generally, does quantum communication
enhance our understanding of what “pattern” is in the
first place? What if the original process is itself
quantum? More practically, is the quantum encoding more
compact?

A provocative answer to the last question appeared
recently [1-3], suggesting that a quantum representation
can compress a stochastic process beyond its known
classical limits [4]. In the following, we introduce a
new construction for quantum channels that improves and
broadens that result to any memoryful stochastic process,
is highly computationally efficient, and identifies
optimal quantum compression. Importantly, we draw out the
connection between quantum compressibility and process
cryptic order, a purely classical property that was only
recently discovered [5]. Finally, we discuss the subtle
way in which the quantum framing of pattern and structure
differs from the classical.

Synchronizing Classical Processes

To frame these questions precisely, we focus on patterns
generated by discrete-valued, discrete-time stationary
stochastic processes. There is a broad literature that
addresses such emergent patterns [6-8]. In particular,
computational mechanics is a well developed theory of
pattern whose primary construct, the ε-machine, is a
process’s minimal, unifilar predictor [4]. The
ε-machine’s causal states $\sigma \in \mathcal{S}$ are
defined by the equivalence relation that groups all
histories $\overleftarrow{x} = x_{-\infty:0}$ that lead to
the same prediction of the future
$\overrightarrow{X} = X_{0:\infty}$:

$$\overleftarrow{x} \sim \overleftarrow{x}' \iff \Pr(\overrightarrow{X} \mid \overleftarrow{x}) = \Pr(\overrightarrow{X} \mid \overleftarrow{x}') . \qquad (1)$$

Helpfully, a process’s ε-machine allows one to directly
calculate its measures of unpredictability, memory, and
structure.

For example, the most basic question about
unpredictability is, how much uncertainty about the next
future observation remains given complete knowledge of
the infinite past? This is measured by the well-known
Shannon entropy rate [9-12]:

$$h_\mu = \lim_{L \to \infty} H(X_L \mid X_{0:L}) ,$$

where $X_{0:L}$ denotes the block of symbols of length L, and
$H = -\sum_i p_i \log p_i$ is the Shannon entropy (in bits
using log base 2) of the probability distribution
$\{p_i\}$ [13]. A process’s ε-machine allows us to directly
calculate this in closed form as the state-averaged
branching uncertainty:

$$h_\mu = \sum_{\sigma_i \in \mathcal{S}} \pi_i \, H(X_0 \mid \mathcal{S}_0 = \sigma_i) ,$$

where $\pi$ denotes the stationary distribution over the
causal states. This form is possible due to the
ε-machine’s unifilarity: in each state $\sigma$, each
symbol x leads to at most one successor state $\sigma'$.
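The state-averaged formula can be sketched numerically. The block below is a minimal illustration, not the authors' code: the dictionary encoding of an ε-machine, the function names, and the choice of the standard two-state Golden Mean Process as test case are all assumptions made here for concreteness.

```python
import numpy as np

# Hypothetical encoding (an assumption of this sketch):
# machine[state][symbol] = (probability, successor state).
# One entry per (state, symbol) makes the machine unifilar by construction.
def golden_mean(p):
    """Standard Golden Mean Process, used only as an illustration."""
    return {"A": {"1": (p, "A"), "0": (1.0 - p, "B")},
            "B": {"1": (1.0, "A")}}

def stationary_distribution(machine):
    """Stationary distribution pi over causal states (left eigenvector of T)."""
    states = sorted(machine)
    index = {s: i for i, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s, edges in machine.items():
        for prob, nxt in edges.values():
            T[index[s], index[nxt]] += prob
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return dict(zip(states, v / v.sum()))

def entropy_rate(machine):
    """h_mu = sum_i pi_i H(X_0 | S_0 = sigma_i), in bits."""
    pi = stationary_distribution(machine)
    h = 0.0
    for s, edges in machine.items():
        probs = np.array([prob for prob, _ in edges.values() if prob > 0])
        h += pi[s] * -(probs * np.log2(probs)).sum()
    return float(h)

if __name__ == "__main__":
    print(entropy_rate(golden_mean(0.5)))
```

For the Golden Mean Process the stationary distribution is $\pi_A = 1/(2-p)$, so the result can be checked against the closed form $h_\mu = H(p)/(2-p)$.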

One can ask the complementary question: given knowledge
of the infinite past, how much can we reduce our
uncertainty about the future? This quantity is the mutual
information between the past and future and is known as
the excess entropy [9, and citations therein]:

$$E = I[X_{-\infty:0} : X_{0:\infty}] .$$

It is the total amount of future information predictable
from the past. Using the ε-machine we can directly
calculate it also:

$$E = I[\mathcal{S}^+ : \mathcal{S}^-] ,$$

where $\mathcal{S}^+$ and $\mathcal{S}^-$ are the forward
(predictive) and reverse (retrodictive) causal states,
respectively [5]. This suggests we think of any process
as a channel that communicates the past to the future
through the present. In this view E is the information
transmission rate through the present “channel”. The
excess entropy has been applied to capture the total
predictable information in such diverse systems as Ising
spin models [14], diffusion in nonlinear potentials [15],
neural spike trains [16-18], and human language [19].

What memory is necessary to implement predicting E
bits of the future given the past? Said differently, what
resources are required to instantiate this putative
channel? Most basically, this is simply the historical
information the process remembers and stores in the
present. The minimum necessary such information is that
stored in the causal states, the statistical complexity
[4]:

$$C_\mu = H(\mathcal{S}) = -\sum_i \pi_i \log \pi_i ,$$

where $\pi_i$ denotes the stationary distribution over
causal states. Importantly, it is lower-bounded by the
excess entropy:

$$E \le C_\mu .$$

What do these quantities tell us? Perhaps the most
surprising observation is that there is a large class of
cryptic processes for which $E < C_\mu$ [5]. The structural
mechanism behind this difference is characterized by the
cryptic order: the minimum k for which
$H[\mathcal{S}_k \mid X_{0:\infty}] = 0$. A related and more
familiar property is the Markov order: the smallest R for
which $H[\mathcal{S}_R \mid X_{0:R}] = 0$. Markov order
reflects a process’s historical dependence. These orders
are independent apart from the fact that $k \le R$
[20, 21]. It is worth pointing out that the equality
$E = C_\mu$ is obtained exactly for cryptic order k = 0
and, furthermore, that this corresponds with
counifilarity: for each state $\sigma'$ and each symbol x,
there is at most one prior state $\sigma$ that leads to
$\sigma'$ on a transition generating x [21].

These properties play a key role in the following
communication scenario where we have a given process’s
ε-machine in hand. Alice and Bob each have a copy.
Since she has been following the process for some time,
using her ε-machine Alice knows that the process is
currently in state $\sigma_j$, say. From this knowledge,
she can use her ε-machine to make the optimal
probabilistic prediction $\Pr(X_{0:L} \mid \sigma_j)$ about
the process’s future (and do so over arbitrarily long
horizons L). While Bob is able to produce all such
predictions from each of his ε-machine’s states, he does
not know which particular state is currently relevant to
Alice. We say that Bob and Alice are unsynchronized.

To communicate the relevant state to Bob, Alice must
send at least $C_\mu$ bits of information. More precisely,
to communicate this information for an ensemble (size
$N \to \infty$) of ε-machines, she may, by the Shannon
noiseless coding theorem [13], send $N C_\mu$ bits. Under
this interpretation, $C_\mu$ is a fundamental measure of a
process’s structure in that it characterizes not only the
correlation between past and future, but also the
mechanism of prediction. In the scenario with Alice and
Bob, $C_\mu$ is seen as the communication cost to
synchronize. We can also imagine Alice using this channel
to communicate with her future self. In this light,
$C_\mu$ is understood as a fundamental measure of a
process’s internal memory.


RESULTS 

A. Quantum Synchronization 

What if Alice can send qubits to Bob? Consider a
communication protocol in which Alice encodes the causal
state in a quantum state that is sent to Bob. Bob then
extracts the information through measurement of this
quantum state. Their communication is implemented
via a quantum object, the q-machine, that simulates
the original stochastic process. It sports a single
parameter that sets the horizon-length L of future words
incorporated in the quantum-state superpositions it
employs. We monitor the q-machine protocol’s efficacy by
comparing the quantum-state information transmission rate
to the classical causal-state rate ($C_\mu$).

The q-machine M(L) consists of a set $\{|\eta_k(L)\rangle\}$
of pure signal states that are in one-to-one
correspondence with the classical causal states
$\sigma_k \in \mathcal{S}$. Each signal state
$|\eta_k(L)\rangle$ encodes the set of length-L words that
may follow $\sigma_k$, as well as each corresponding
conditional probability used for prediction from
$\sigma_k$. Fixing L, we construct quantum states of the
form:

$$|\eta_j(L)\rangle = \sum_{w^L \in \mathcal{A}^L} \sum_{\sigma_k \in \mathcal{S}} \sqrt{\Pr(w^L, \sigma_k \mid \sigma_j)} \; |w^L\rangle |\sigma_k\rangle ,$$

where $w^L$ denotes a length-L word and
$\Pr(w^L, \sigma_k \mid \sigma_j) = \Pr(X_{0:L} = w^L, \mathcal{S}_L = \sigma_k \mid \mathcal{S}_0 = \sigma_j)$.
Due to ε-machine unifilarity, a word following a causal
state $\sigma_j$ leads to only one subsequent causal state.
Thus, $\Pr(w^L, \sigma_k \mid \sigma_j) = \Pr(w^L \mid \sigma_j)$.
The resulting Hilbert space is the product
$\mathcal{H}_w \otimes \mathcal{H}_\sigma$. Factor space
$\mathcal{H}_\sigma$ is of size $|\mathcal{S}|$, the number
of classical causal states, with basis elements
$|\sigma_k\rangle$. Factor space $\mathcal{H}_w$ is of size
$|\mathcal{A}|^L$, the number of length-L words, with basis
elements $|w^L\rangle = |x_0\rangle \cdots |x_{L-1}\rangle$.
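The construction can be made concrete with a sparse representation of a signal state. This is a sketch under stated assumptions: the dictionary encoding of the ε-machine, the names `signal_state` and `overlap`, and the Golden Mean test machine are illustrative choices made here, not part of the original construction.

```python
import math

# Hypothetical encoding: machine[state][symbol] = (probability, successor).
def golden_mean(p):
    """Standard Golden Mean Process, used only as an illustration."""
    return {"A": {"1": (p, "A"), "0": (1.0 - p, "B")},
            "B": {"1": (1.0, "A")}}

def signal_state(machine, start, L):
    """Sparse |eta_start(L)>: {(word, end_state): amplitude}.

    Amplitudes are sqrt(Pr(word | start)); unifilarity guarantees that each
    word from `start` ends in exactly one state, so keys never collide.
    """
    amps = {("", start): 1.0}
    for _ in range(L):
        extended = {}
        for (word, state), amp in amps.items():
            for sym, (prob, nxt) in machine[state].items():
                if prob > 0:
                    extended[(word + sym, nxt)] = amp * math.sqrt(prob)
        amps = extended
    return amps

def overlap(a, b):
    """Inner product <a|b> of two sparse signal states (real amplitudes)."""
    return sum(amp * b.get(key, 0.0) for key, amp in a.items())

if __name__ == "__main__":
    m = golden_mean(0.5)
    eta_a, eta_b = signal_state(m, "A", 1), signal_state(m, "B", 1)
    print(eta_a, eta_b, overlap(eta_a, eta_b))
```

Storing only nonzero amplitudes avoids allocating the full $|\mathcal{A}|^L |\mathcal{S}|$-dimensional vector, although the number of entries can still grow quickly with L.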

Note that the L = 1 q-machine M(1) is equivalent
to the construction introduced in Ref. [1]. Additionally,
insight about the q-machine can be gained through its
connection with the classical concatenation machine
defined in Ref. [22]; the q-machine M(L) is equivalent to
the q-machine M(1) derived from the Lth concatenation
machine.

Having specified the Hilbert state space, we now
describe how the q-machine produces symbol sequences.
Given one of the pure quantum signal states, we perform
a projective measurement in the $\mathcal{H}_w$ basis. This
results in a symbol string $w^L = x_0, \ldots, x_{L-1}$,
which we take as the next L symbols in the generated
process. Since the ε-machine is unifilar, the quantum
conditional state must be in some basis state
$|\sigma_k\rangle$ of $\mathcal{H}_\sigma$. Subsequent
measurement in this basis then indicates the corresponding
classical causal state with no uncertainty.

Observe that the probability of a word given quantum
state $|\eta_k\rangle$ is equal to the probability of that
word given the analogous classical state $\sigma_k$. Also,
the classical knowledge of the subsequent corresponding
causal state can be used to prepare a subsequent quantum
state for continued symbol generation. Thus, the q-machine
generates the desired stochastic process and is, in this
sense, equivalent to the classical ε-machine.

Focus now on the q-machine’s initial quantum state:

$$\rho(L) = \sum_i p_i \, |\eta_i(L)\rangle \langle \eta_i(L)| .$$

We see this mixed quantum state is composed of pure
signal states combined according to the probabilities of
each being prepared by Alice (or being realized by the
original process that she observes). These are simply the
probabilities of each corresponding classical causal state,
which we take to be the stationary distribution:
$p_i = \pi_i$.


In short, quantum state $\rho(L)$ is what Alice must
transmit to Bob for him to successfully synchronize.
Later, we revisit this scenario to discuss the tradeoffs
associated with the q-machine representation.

If Alice sends a large number N of these states, she
may, according to the quantum noiseless coding theorem
[23], compress this message into $N S(\rho(L))$ qubits,
where S is the von Neumann entropy
$S(\rho) = -\mathrm{tr}(\rho \log \rho)$. Due to its
parallel with $C_\mu$, and for convenience, we define the
function:

$$C_q(L) = S(\rho(L)) .$$

Recall that, classically, Alice must send $N C_\mu$ bits.
To the extent that $N C_q(L)$ is smaller, the quantum
protocol will be more efficient.
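As a minimal numerical sketch of the von Neumann entropy used here, the function below diagonalizes a density matrix with NumPy and sums $-\lambda \log_2 \lambda$ over its nonzero eigenvalues. The function name, the zero-eigenvalue cutoff, and the two-qubit-state example are assumptions made for illustration.

```python
import numpy as np

def von_neumann_entropy(rho, cutoff=1e-12):
    """S(rho) = -tr(rho log2 rho), in qubits, for a Hermitian density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > cutoff]  # treat 0 log 0 as 0
    return float(-(evals * np.log2(evals)).sum())

def mixture(weights, states):
    """rho = sum_i p_i |psi_i><psi_i| for real or complex state vectors."""
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for p, psi in zip(weights, states):
        psi = np.asarray(psi, dtype=complex)
        rho += p * np.outer(psi, psi.conj())
    return rho

if __name__ == "__main__":
    zero = [1.0, 0.0]
    plus = [2 ** -0.5, 2 ** -0.5]
    one = [0.0, 1.0]
    # Orthogonal signal states: entropy equals the Shannon entropy (1 bit).
    print(von_neumann_entropy(mixture([0.5, 0.5], [zero, one])))
    # Nonorthogonal signal states: entropy falls below 1 bit.
    print(von_neumann_entropy(mixture([0.5, 0.5], [zero, plus])))
```

The second case illustrates the mechanism behind the quantum advantage: overlapping signal states mixed with the same classical probabilities yield a smaller entropy.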


B. Example Processes: $C_q(L)$

Let’s now draw out specific consequences of using the
q-machine. We explore protocol efficiency by calculating
$C_q(L)$ for several example processes, each chosen to
illustrate distinct properties: the q-machine affords a
quantum advantage, further compression can be found at
longer horizons L, and the compression rate is minimized
at the horizon length k, the cryptic order of the
classical process [21].

For each example, we examine a process family by
sweeping one transition probability parameter,
illustrating $C_q(L)$ and its relation to classical bounds
$C_\mu$ and E. Additionally, we highlight a single
representative process at one generic transition
probability. Following these examples, we turn to discuss
more general properties of q-machine compression that
apply quite broadly and how the results alter our notion
of quantum structural complexity.


1. Biased Coins Process 

The Biased Coins Process provides a first, simple case
that realizes a nontrivial quantum state entropy [1].
There are two biased coins, named A and B. The first
generates 0 with probability p; the second, 0 with
probability 1 − p. A coin is picked and flipped,
generating some output 0 or 1. With probability 1 − p the
other coin is used next. Its two causal-state ε-machine is
shown in Fig. 1(top).

Consider p ≈ 1/2. The generated sequence is close to
that of a fair coin. And, starting with coin A or B makes
little difference to the future. There is little to predict
about future sequences. This intuition is quantified by
the predictable information E ≈ 0, when p is near 1/2.
See Fig. 1(left).

In contrast, since the causal states have equal
probability, $C_\mu = 1$ bit independent of parameter p.
(All informations are quoted in log base 2.) The gap
between $C_\mu$ and E presents an opportunity for large
quantum improvement. This is because there is always some,
albeit very little, predictive advantage to remembering
whether the last symbol was 0 or 1. Retaining this
advantage, however small, requires the use of an entire
(classical) bit. It is only at the exact value p = 1/2
where the two causal states merge, this advantage
disappears, and the process becomes memoryless (IID).
This is reflected in the discontinuity of $C_\mu$ as
$p \to 1/2$, which is sometimes misinterpreted as a
deficiency of $C_\mu$. Contrariwise, this feature follows
naturally from the equivalence relation and is a
signature of symmetry.

FIG. 1. Biased Coins Process: (top) ε-machine. Edges are
conditional probabilities. For example, the self-loop on
state A labeled p|0 indicates Pr(0|A) = p. (left)
Statistical complexity $C_\mu$, quantum state entropy
$C_q(L)$, and excess entropy E as a function of A’s
self-loop probability p ∈ [0,1]. $C_q(1)$ (dark blue) lies
between $C_\mu$ and E (bits), except for extreme parameters
and the center (p = 1/2). (right) For p = 0.666, $C_q(L)$
decreases from L = 0 to L = 1 and is then constant; the
process is maximally compressed at L = 1, its cryptic
order k = 1. This yields substantial compression:
$C_q(1) \ll C_\mu$.

FIG. 2. 4-3 Golden Mean Process: (top) The ε-machine.
(left) Statistical complexity $C_\mu$, quantum state
entropy $C_q(L)$, and excess entropy E as a function of
A’s self-loop probability p ∈ [0,1]. $C_q(L)$ is calculated
and plotted (light to dark blue) up to L = 5. (right) For
p = 0.505, $C_q(L)$ decreases monotonically until L = 3,
the process’s cryptic order. The additional compression is
substantial: $C_q(3) \ll C_q(1)$.

Now, let’s consider these complexities in the quantum
setting where we monitor communication costs using
$C_q(L)$. To understand its behavior, we first write
down the q-machine’s states. For L = 0, we have the
trivial $|\eta_A(0)\rangle = |A\rangle$ and
$|\eta_B(0)\rangle = |B\rangle$. For L = 1, following the
edge probabilities Pr(0|A) = p and Pr(0|B) = 1 − p, we
have
$|\eta_A(1)\rangle = \sqrt{p}\,|0\rangle|A\rangle + \sqrt{1-p}\,|1\rangle|B\rangle$
and
$|\eta_B(1)\rangle = \sqrt{1-p}\,|0\rangle|A\rangle + \sqrt{p}\,|1\rangle|B\rangle$.
The von Neumann entropy of the former is simply the
Shannon information of the signal state distribution; that
is, $C_q(0) = C_\mu$. In the latter, however, the two
quantum states have a nonzero overlap (inner product).
This implies that the von Neumann entropy is smaller than
the Shannon entropy: $C_q(1) < C_\mu = C_q(0)$. (See Ref.
[24] Thm. 11.10.) Also, making use of the Holevo bound, we
see that $E \le C_q(1)$ [1, 25]. These bounds are
maintained for all L: $E \le C_q(L) \le C_\mu$. This
follows by considering the q-machine M(1) of the Lth
classical concatenation.
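The L = 1 entropy can be worked out in closed form. The derivation below is a sketch that assumes the signal states follow the general definition with Pr(0|A) = p and Pr(0|B) = 1 − p and that both causal states have stationary probability 1/2; the symbol $c$ for the overlap is introduced here only for convenience.

```latex
\begin{align*}
  c &\equiv \langle \eta_A(1) | \eta_B(1) \rangle
     = \sqrt{p}\sqrt{1-p} + \sqrt{1-p}\sqrt{p}
     = 2\sqrt{p(1-p)} , \\
  \rho(1) &= \tfrac{1}{2} |\eta_A(1)\rangle\langle\eta_A(1)|
           + \tfrac{1}{2} |\eta_B(1)\rangle\langle\eta_B(1)| ,
  \qquad
  \lambda_\pm = \frac{1 \pm c}{2} , \\
  C_q(1) &= -\lambda_+ \log_2 \lambda_+ - \lambda_- \log_2 \lambda_- .
\end{align*}
% Numerical check at p = 0.666:
%   c \approx 0.943, \lambda_+ \approx 0.972, \lambda_- \approx 0.028,
%   so C_q(1) \approx 0.19 bits, compared with C_\mu = 1 bit.
```

The eigenvalues follow because an equal mixture of two pure states with real overlap $c$ has the symmetric and antisymmetric combinations as eigenvectors. At p = 1/2 the overlap reaches $c = 1$ and the entropy vanishes, consistent with the two causal states merging.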

(Note that for p ∈ {0, 1/2, 1} these quantities are all
equal and equal to zero. This comes from the
simplification of process topology caused by state merging
dictated by the predictive equivalence relation, Eq. (1).)

How do costs change with sequence length L? To see
this, Fig. 1(right) expands the left view, but for a
single value of p. As expected, $C_q(L)$ decreases from
L = 0 to L = 1. However, it then remains constant for all
L ≥ 1. There is no additional quantum state-compression
afforded by expanding the q-machine to use longer
horizons.
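This behavior can be reproduced with a short numerical sketch. It does not build the exponentially large Hilbert space; instead it assumes a recursion on pairwise overlaps, $\langle\eta_a(L)|\eta_b(L)\rangle = \sum_x \sqrt{\Pr(x|a)\Pr(x|b)}\,\langle\eta_{a'}(L-1)|\eta_{b'}(L-1)\rangle$ with $a'$, $b'$ the successors on x, and then uses the fact that $\rho(L)$ shares its nonzero eigenvalues with the Gram matrix weighted by $\sqrt{\pi_a \pi_b}$. The encoding and all function names are assumptions of this illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical encoding: machine[state][symbol] = (probability, successor).
def biased_coins(p):
    return {"A": {"0": (p, "A"), "1": (1 - p, "B")},
            "B": {"0": (1 - p, "A"), "1": (p, "B")}}

def stationary(machine):
    states = sorted(machine)
    idx = {s: i for i, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s in states:
        for prob, nxt in machine[s].values():
            T[idx[s], idx[nxt]] += prob
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return np.clip(v / v.sum(), 0, None)

def gram(machine, L):
    """Matrix of overlaps <eta_a(L)|eta_b(L)>, states in sorted order."""
    states = sorted(machine)
    G = {(a, b): float(a == b) for a in states for b in states}
    for _ in range(L):
        G = {(a, b): sum(np.sqrt(pa * machine[b][x][0]) * G[(na, machine[b][x][1])]
                         for x, (pa, na) in machine[a].items() if x in machine[b])
             for a in states for b in states}
    return np.array([[G[(a, b)] for b in states] for a in states])

def c_q(machine, L):
    """C_q(L) in qubits from the pi-weighted Gram matrix."""
    pi = stationary(machine)
    evals = np.linalg.eigvalsh(np.sqrt(np.outer(pi, pi)) * gram(machine, L))
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

if __name__ == "__main__":
    m = biased_coins(0.666)
    for L in range(4):
        print(L, c_q(m, L))
```

Under these assumptions the printed values drop once, between L = 0 and L = 1, and then stay fixed, since no new overlap is created at longer horizons.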

The Biased Coins Process has been analyzed earlier
using a construction equivalent to an L = 1 q-machine
[1], similarly finding that the number of required
qubits falls between E and $C_\mu$. The explanation there
for this compression ($C_q(1) < C_\mu$) was lack of
counifilarity in the process’s ε-machine. More
specifically, Ref. [1] showed that $E = C_q = C_\mu$ if and
only if the ε-machine is counifilar, and
$E < C_q < C_\mu$ otherwise. The Biased Coins Process is
easily seen to be noncounifilar and so the inequality
follows.

This previous analysis happens to be sufficient for the
Biased Coins Process, since $C_q(L)$ does not decrease
beyond L = 1. Unfortunately, only this single, two-state
process was analyzed when, in fact, the space of processes
is replete with richly structured behaviors [26]. With this
in mind and to show the power of the q-machine, we step
into deeper water to consider a 7-state process that is
almost periodic with a random phase-slip.

2. R-k Golden Mean Process 

The R-k Golden Mean Process is a useful generalization
of the Markov order-1 Golden Mean Process that allows for
the independent specification of Markov order R and
cryptic order k [20, 21]. Figure 2(top) illustrates its
ε-machine. We take R = 4 and k = 3.

The calculations in Fig. 2(left) show again that $C_q(L)$
generically lies between E and $C_\mu$ across this family
of processes. In contrast with the previous example,
$C_q(L)$ continues to decrease beyond L = 1. Figure
2(right) illustrates that the successive q-machines
continue to reduce the von Neumann entropy:
$C_\mu > C_q(1) > C_q(2) > C_q(3)$. However, there is no
further improvement beyond a future-depth of L = 3, the
cryptic order: $C_q(3) = C_q(L > 3)$. It is important to
note that the compression improvements at stages L = 2
and L = 3 are significant. Therefore, a length-1 quantum
representation misses the majority of the quantum
advantage.

To understand these results we need to sort out how
quantum compression stems from noncounifilarity. In
short, the latter leads to quantum signal states with
nonzero overlap that allow for super-classical
compression. Let’s explain using the current example.
There is one noncounifilar state in this process, state A.
Both states A and G lead to A on a symbol 1. Due to this,
at L = 1, the two q-machine states:

$$|\eta_A\rangle = \sqrt{p}\,|1\rangle|A\rangle + \sqrt{1-p}\,|0\rangle|B\rangle \quad \text{and}$$
$$|\eta_G\rangle = |1\rangle|A\rangle$$

have a nonzero overlap of
$\langle \eta_A | \eta_G \rangle = \sqrt{p}$. (All other
overlaps in the L = 1 q-machine vanish.) As with the
Biased Coins Process, this leads to the inequality
$C_q(1) < C_\mu$.

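Extending the representation to L = 2 words, we find
three nonorthogonal quantum states:

$$|\eta_A\rangle = p\,|11\rangle|A\rangle + \sqrt{p(1-p)}\,|10\rangle|B\rangle + \sqrt{1-p}\,|00\rangle|C\rangle ,$$
$$|\eta_F\rangle = |11\rangle|A\rangle , \text{ and}$$
$$|\eta_G\rangle = \sqrt{p}\,|11\rangle|A\rangle + \sqrt{1-p}\,|10\rangle|B\rangle ,$$

with three nonzero overlaps
$\langle \eta_A | \eta_F \rangle = p$,
$\langle \eta_A | \eta_G \rangle = \sqrt{p}$, and
$\langle \eta_F | \eta_G \rangle = \sqrt{p}$.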

Note that the overlap $\langle \eta_A | \eta_G \rangle$ is
unchanged. This is because the conditional futures are
identical once the merger on symbol 1 has taken place.
That is, the words 11 and 10, which contribute to the
L = 2 $\langle \eta_A | \eta_G \rangle$ overlap, simply
derive from the prefix 1, which was the source of the
overlap at L = 1. In order to obtain a change in this or
any other overlap, there must be a new merger-inducing
prefix (for that state-pair). (See Sec. E for
computational implications.) Since all quantum amplitudes
are positive, each pairwise overlap is a nondecreasing
function of L.

At L = 2 we have two such new mergers: 11 for
$\langle \eta_A | \eta_F \rangle$ and 11 for
$\langle \eta_F | \eta_G \rangle$. This additional increase
in pairwise overlaps leads to a second decrease in the von
Neumann entropy. (See Sec. C for details.) Then, at L = 3,
we find three new mergers: 111 for
$\langle \eta_A | \eta_E \rangle$, 111 for
$\langle \eta_E | \eta_F \rangle$, and 111 for
$\langle \eta_E | \eta_G \rangle$. As before, the
pre-existing mergers simply acquire suffixes and do not
change the degree of overlap.

Importantly, we find that at L = 4 there are no new
mergers. That is, any length-4 word that leads to the
merging of two states must merge before the fourth
symbol. In general, the length at which the last merger
occurs is equivalent to the cryptic order [21]. Further, it
is known that the von Neumann entropy is a function
of pairwise overlaps of signal states [27]. Therefore, a
lack of new mergers, and thus constant overlaps, implies
that the von Neumann entropy is constant. This
demonstrates that $C_q(L)$ is constant for L ≥ k, for k the
cryptic order.
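The overlap bookkeeping above can be checked numerically. The sketch below assumes a seven-state topology for the 4-3 Golden Mean Process that is inferred here from the signal states and mergers quoted in the text (A loops on 1 with probability p, a chain of four 0s leads from A to E, and three 1s lead from E back to A); it is not read from Fig. 2. The overlap recursion and all function names are likewise assumptions of this illustration.

```python
import numpy as np

# Inferred (hypothetical) 4-3 Golden Mean topology:
# machine[state][symbol] = (probability, successor).
def rk_golden_mean_4_3(p):
    return {"A": {"1": (p, "A"), "0": (1 - p, "B")},
            "B": {"0": (1.0, "C")}, "C": {"0": (1.0, "D")},
            "D": {"0": (1.0, "E")}, "E": {"1": (1.0, "F")},
            "F": {"1": (1.0, "G")}, "G": {"1": (1.0, "A")}}

def stationary(machine):
    states = sorted(machine)
    idx = {s: i for i, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s in states:
        for prob, nxt in machine[s].values():
            T[idx[s], idx[nxt]] += prob
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return np.clip(v / v.sum(), 0, None)

def gram(machine, L):
    """Overlaps <eta_a(L)|eta_b(L)> via one recursion step per symbol."""
    states = sorted(machine)
    G = {(a, b): float(a == b) for a in states for b in states}
    for _ in range(L):
        G = {(a, b): sum(np.sqrt(pa * machine[b][x][0]) * G[(na, machine[b][x][1])]
                         for x, (pa, na) in machine[a].items() if x in machine[b])
             for a in states for b in states}
    return np.array([[G[(a, b)] for b in states] for a in states])

def c_q(machine, L):
    pi = stationary(machine)
    evals = np.linalg.eigvalsh(np.sqrt(np.outer(pi, pi)) * gram(machine, L))
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

if __name__ == "__main__":
    m = rk_golden_mean_4_3(0.505)
    for L in range(6):
        print(L, round(c_q(m, L), 6))
```

With this inferred topology the L = 1 and L = 2 overlaps match those quoted above, and the overlaps (hence $C_q(L)$) stop changing after L = 3, in line with the cryptic order k = 3.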

The R-k Golden Mean Process was selected to highlight
the unique role of the cryptic order, by drawing
a distinction between it and Markov order. The result
emphasizes the physical significance of the cryptic order.
In the example, it is not until L = 4 that a naive
observer can synchronize to the causal state; this is
shown by the Markov order. For example, the word 000
induces two states D and E. Just one more symbol
synchronizes to either E (on 0) or F (on 1). Yet recall
that synchronization can come about in two ways. A word
may either induce a path merger or a path termination.
All merger-type synchronizations must occur no later than
the last termination-type synchronization. This is
equivalently stated: the cryptic order is never greater
than the Markov order [21].

In the current example, we observe this termination-type
of synchronization on the symbol following 000. For
instance, 0000 does not lead to the merger of paths
originating in multiple states. Rather, it eliminates the
possibility that the original state might have been B.

It is the final merger-type synchronization at L = 3
that leads to the final unique-prefix quantum merger and,
thus, to the ultimate minimization of the von Neumann
entropy. So, we see that in the context of the q-machine,
the most efficient state compression is accomplished at
the process’s cryptic order. (One could certainly
continue beyond the cryptic order, but at best this
increases implementation cost with no functional benefit.)


3. Nemo Process 

To demonstrate the challenges in quantum compressing
typical memoryful stochastic processes, we conclude
our set of examples with the seemingly simple three-state
Nemo Process, shown in Fig. 3(top). Despite its
overt simplicity, both Markov and cryptic orders are
infinite. As one should now anticipate, each increase in
the length L affords a smaller and smaller state entropy,
yielding the infinite chain of inequalities:
$C_\mu > C_q(1) > C_q(2) > C_q(3) > \cdots > C_q(\infty)$.
Figure 3(right) verifies this. This sequence approaches
the asymptotic value $C_q(\infty) \approx 1.0332$. We also
notice that the convergence of $C_q(L)$ is richer than in
the previous processes. For example, while the sequence
monotonically decreases (and at each p), it is not convex
in L. For instance, the fourth quantum incremental
improvement is greater than the third.

FIG. 3. Nemo Process: (top) Its ε-machine. (left)
Statistical complexity $C_\mu$, quantum state entropy
$C_q(L)$, and excess entropy E as a function of A’s
self-loop probability p ∈ [0,1]. $C_q(L)$ is calculated
and plotted (light to dark blue) for L = 0, ... (right)
For p = 0.666, $C_q(L)$ decreases monotonically, never
reaching the limit since the process’s cryptic order is
infinite. The full quantum advantage is realized only in
the limit.

We now turn to discuss the broader theory that underlies
the preceding analyses. We first address the convergence
properties of $C_q(L)$, then the importance of studying
the full range of memoryful stochastic processes, and
finally tradeoffs between synchronization, compression,
and prediction.


C. $C_q(L)$ Monotonicity

It is important to point out that while we observed
nonincreasing $C_q(L)$ in our examples, this does not
constitute proof. The latter is nontrivial since Ref. [27]
showed that each pairwise overlap of signal states can
increase while also increasing von Neumann entropy. (This
assumes a constant distribution over signal states.)
Furthermore, this phenomenon occurs with nonzero measure.
They also provided a criterion that can exclude
this somewhat nonintuitive behavior. Specifically, if the
element-wise ratio matrix R of two Gram matrices of signal
states is a positive operator, then strictly increasing
overlaps imply a decreasing von Neumann entropy. We
note, however, that there exist processes with ε-machines
for which the R matrix is nonpositive. At the same time,
we have found no example of an increasing $C_q(L)$.

So, while it appears that a new criterion is required to
settle this issue, the preponderance of numerical evidence
suggests that $C_q(L)$ is indeed monotonically decreasing.
In particular, we verified $C_q(L)$ monotonicity for many
processes drawn from the topological ε-machine library
[28]. Examining 1000 random samples of two-symbol,
N-state processes for 2 ≤ N ≤ 7 yielded no
counterexamples. Thus, failing a proof, the survey
suggests that this is the dominant behavior.

FIG. 4. Distribution of Markov order R and cryptic order
k for all 1,132,613 six-state, binary-alphabet,
exactly-synchronizing ε-machines. Marker size is
proportional to the number of ε-machines within this class
at the same (R, k). (Reprinted with permission from Ref.
[29].)


D. Infinite Cryptic Order Dominates 

The Biased Coins Process, being cryptic order k = 1, is
atypical. Previous exhaustive surveys demonstrated the
ubiquity of infinite Markov and cryptic orders within
process space. For example, Fig. 4 shows the distribution
of different Markov and cryptic orders for processes
generated by six-state, binary-alphabet,
exactly-synchronizing ε-machines [29]. The overwhelming
majority have infinite Markov and cryptic orders.
Furthermore, among those with finite cryptic order, orders
zero and one are not common. Such surveys in combination
with the apparent monotonic decrease of $C_q(L)$ confirm
that, when it comes to general claims about
compressibility and complexity, it is advantageous to
extend analyses to long sequence lengths.

FIG. 5. Trading prediction for quantum compression:
$\mathcal{A}$ is Alice’s state of predictive knowledge.
$\mathcal{B}$ is that for Bob, except when he uses the
process’s ε-machine to refine it. In which case, his
predictive knowledge becomes that in $\mathcal{B}'$, which
can occur at a time no earlier than that determined by the
cryptic order k.


E. Prediction-Compression Trade Off 

Let’s return to Alice and Bob in their attempt to
synchronize on a given stochastic process to explore
somewhat subtle trade-offs in compressibility, prediction,
and complexity. Figure 5 illustrates the difference in
their ability to generate probabilistic predictions about
the future given the historical data. There, Alice is in
causal state A (signified by $\mathcal{A}$ for Alice). Her
prediction “cone” is depicted in light gray. It depicts
the span over which she can generate probabilistic
predictions conditioned on the current causal state (A).
She chooses to map this classical causal state to an
L = 3 q-machine state and send it to Bob. (Whether this is
part of an ensemble of other such states or not affects
the rate of qubit transmission, but not the following
argument.) It is important to understand that Bob cannot
actually determine the corresponding causal state (at time
t = 0). He can, however, make a measurement that results
in some symbol sequence of length 3 followed by a definite
(classical) causal state. In the figure, he generates the
sequence 111 followed by causal state A at time t = 3.
This is shown by the blue state-path ending in
$\mathcal{B}$ for Bob. Now Bob is in position to generate
corresponding conditional predictions: $\mathcal{B}$’s
future cone $\Pr(X_{0:\infty} \mid A)$. As the figure
shows, this cone is only a subprediction of Alice’s. That
is, it is equivalent to Alice’s prediction conditioned on
her observation of 111 or any other word leading to the
same state.

Now, what can Bob say about times t = 0, 1, 2? The
light blue states and edges in the figure show the
alternate paths that could have also led to his
measurement of the sequence 111 and state A. For instance,
Bob can only say that Alice might have been in causal
states A, D, or E at time t = 0. In short, the quantum
representation led to his uncertainty about the initial
state sequence and, in particular, Alice’s prediction. All
together, we see that the quantum representation gains
compressibility at the expense of Bob’s predictive power.

What if Alice does not bother to compute k and, wanting
to make good use of quantum compressibility, uses an
L = 1000 q-machine? Does this necessarily translate into
Bob’s uncertainty in the first 1000 states and, therefore,
only a highly conditional prediction? In our example,
Alice was not quite so enthusiastic and settled for the
L = 3 q-machine. We see that Bob can use his current
state A at t = 3 and knowledge of the word that led to
it to infer that the state at t = 2 must have been A.
The figure denotes his knowledge of this state by
$\mathcal{B}'$. For other words he may be able to trace
farther back. (For instance, 000 can be traced back from D
at t = 3 all the way to A at t = 0.) The situation chosen
in the figure illustrates the worst-case scenario for this
process where he is able to trace back and discover all
but the first 2 states. The worst-case scenario defines
the cryptic order k, in this case k = 2. After this
tracing back, Bob is then able to make the improved
statement, “If Alice observes symbols 11, then her
conditional prediction will be
$\Pr(X_{0:\infty} \mid A)$”. This means that Alice and Bob
cannot suffer through overcoding: using an L in excess of k.

Finally, one feature that is unaffected by such
manipulations is the ability of Alice and Bob to generate
a single future instance drawn from the distribution
$\Pr(X_{0:\infty} \mid A)$. This helps to emphasize that
generation is distinct from prediction. Note that this is
true for the q-machine M(L) at any length.


METHODS 

Let’s explain computing $C_q(L)$. First, note that the
size of the q-machine M(L) Hilbert space grows as
$|\mathcal{A}|^L$ ($|\mathcal{A}|^{2L}$ for the density
operators). That is, computing $C_q(L = 20)$ for the Nemo
Process involves finding eigenvalues of a matrix with
$10^{12}$ elements. Granted, these matrices are often
sparse, but the number of components in each signal state
still grows exponentially with the topological entropy
rate of the process. This alone would drive computations
for even moderately complex processes (described by
moderate-sized ε-machines) beyond the access of
contemporary computers.
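The size estimate can be made explicit with a short calculation. It assumes that the Nemo Process has a binary alphabet, $|\mathcal{A}| = 2$, which is not stated in the text, and it counts only the word factor $\mathcal{H}_w$.

```latex
\begin{align*}
  \dim \mathcal{H}_w &= |\mathcal{A}|^{L} = 2^{20} \approx 1.05 \times 10^{6} , \\
  \text{matrix elements} &= |\mathcal{A}|^{2L} = 2^{40} \approx 1.1 \times 10^{12} .
\end{align*}
% Including the causal-state factor (|S| = 3) multiplies the dimension by 3
% and the element count by 9, giving roughly 10^{13} elements.
```

Either way, the count is far beyond what dense eigenvalue routines can handle, which motivates a computation that works only with the $|\mathcal{S}|$ signal states.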

Recall though that there are, at any L, still only
$|\mathcal{S}|$ quantum signal states to consider.
Therefore, the embedding of this constant-sized subspace
wastes an exponential amount of the embedding space. We
desire a computation of $C_q(L)$ that is independent of
the diverging embedding dimension.

Another source of difficulty is the exponentially increasing 
number of words with L. However, we only need 
to consider a small subset of these words. Once a merger 
has occurred between states $|\eta_i\rangle$ and $|\eta_j\rangle$ on word w, 
subsequent symbols, while maintaining that merger, do not 
add to the corresponding overlap. That is, the contribution 
to the overlap $\langle\eta_i|\eta_j\rangle$ by all words with prefix w is 
complete. 

FIG. 6. Pairwise-merger machines for our three example 
processes. Pair-states (red) lead to each other or enter the 
ε-machine at a noncounifilar state. For example, in the R-k 
Golden Mean (middle), the two pair-states AF and FG both 
lead to pair-state AG on 0. Then pair-state AG leads to state 
A, the only noncounifilar state in this ε-machine. 


To take advantage of these two opportunities for reduction, 
we compute $C_q(L)$ in the following manner. 

First, we construct the “pairwise-merger machine” 
(PMM) from the ε-machine. The states of the PMM 
are unordered pairs of causal states. A pair-state $(\sigma_i, \sigma_j)$ 
leads to $(\sigma_m, \sigma_n)$ on symbol x if $\sigma_i$ leads to $\sigma_m$ on x 
and $\sigma_j$ leads to $\sigma_n$ on x. (Pairs are unordered, so we 
consider $m \leftrightarrow n$ as well.) If both components in a pair-state 
lead to the same causal state, then this represents a 
merger. Of course, these mergers from pair-states occur 
only when entering noncounifilar states of the ε-machine. 
If either component state forbids subsequent emission of 
symbol x, then that edge is omitted. The PMMs for the 
three example processes are shown in Fig. 6. 
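
The construction can be sketched in a few lines of Python. The dictionary encoding of the ε-machine, the function name, and the three-state example machine are illustrative assumptions; the example's topology is Nemo-like but hypothetical, since the transition structure is not specified in this section.

```python
from itertools import combinations


def pairwise_merger_machine(machine):
    """Build the PMM of a unifilar machine.

    machine: {state: {symbol: (next_state, probability)}}
    Returns {pair_state: {symbol: target}}, where target is either an
    unordered pair (a sorted tuple) or a single causal state (a merger).
    """
    pmm = {}
    for s_i, s_j in combinations(sorted(machine), 2):
        edges = {}
        for x, (t_i, _) in machine[s_i].items():
            if x not in machine[s_j]:
                continue                      # one component forbids x: omit edge
            t_j, _ = machine[s_j][x]
            if t_i == t_j:
                edges[x] = t_i                # merger into one causal state
            else:
                edges[x] = tuple(sorted((t_i, t_j)))
        pmm[(s_i, s_j)] = edges
    return pmm


# Hypothetical example machine (assumed topology, p = 0.5).
p = 0.5
EXAMPLE = {
    "A": {"1": ("A", p), "0": ("B", 1 - p)},
    "B": {"0": ("C", 1.0)},
    "C": {"0": ("A", 0.5), "1": ("A", 0.5)},
}

pmm = pairwise_merger_machine(EXAMPLE)
for pair, edges in pmm.items():
    print(pair, edges)
```

In this example the only merger enters state A, which is the one state reached on the same symbol from two different predecessors.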

Now, making use of the PMM, we begin at each noncounifilar 
state and proceed backward through the pair-state 
transient structure. At each horizon-length, we 
record the pair-states visited and with what probabilities. 
This allows computing each increment to each overlap. 
Importantly, by moving up the transient structure, we 
avoid keeping track of any further novel overlaps; they are 
all “behind us”. Additionally, the finite number of pair-states 
gives us a finite structure through which to move; 
when the end of a branch is reached, its contributions 
cease. It is worth noting that this pair-state transient 
structure may contain cycles (as it does for the Nemo 
Process). In that case, the algorithm is non-halting, but 
it is clear that contributions generated within a cycle 
decrease exponentially. 
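
A minimal sketch of the overlap bookkeeping follows. It is not the backward traversal described above: for brevity it propagates amplitudes forward from a chosen pair-state and records a contribution whenever a merger occurs, truncating at a maximum length because cycles never terminate. The machine encoding, the function name, and the three-state example (a hypothetical Nemo-like topology) are assumptions.

```python
from math import sqrt


def overlap_increments(machine, pair, max_len):
    """increments[L] = overlap contributed by mergers completing at word length L.

    machine: {state: {symbol: (next_state, probability)}}, assumed unifilar.
    """
    increments = [0.0] * (max_len + 1)
    frontier = {tuple(sorted(pair)): 1.0}     # pair-state -> accumulated amplitude
    for L in range(1, max_len + 1):
        nxt = {}
        for (s_i, s_j), amp in frontier.items():
            for x, (t_i, p_i) in machine[s_i].items():
                if x not in machine[s_j]:
                    continue                  # word forbidden from one component
                t_j, p_j = machine[s_j][x]
                weight = amp * sqrt(p_i * p_j)
                if t_i == t_j:
                    increments[L] += weight   # merger: contribution is complete
                else:
                    key = tuple(sorted((t_i, t_j)))
                    nxt[key] = nxt.get(key, 0.0) + weight
        frontier = nxt
    return increments


# Hypothetical example machine (assumed topology, p = 0.5).
p = 0.5
EXAMPLE = {
    "A": {"1": ("A", p), "0": ("B", 1 - p)},
    "B": {"0": ("C", 1.0)},
    "C": {"0": ("A", 0.5), "1": ("A", 0.5)},
}

inc_ab = overlap_increments(EXAMPLE, ("A", "B"), 60)
print(inc_ab[:7], sum(inc_ab))
```

For this example the pair-states form a cycle of length three, so nonzero increments recur every three steps and shrink by a constant factor on each pass.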

All of this serves to yield the set of overlaps at each 
length. We then use a Gram-Schmidt-like procedure to 
produce a set of $|\mathcal{S}|$ vectors in $\mathbb{R}_{+}^{|\mathcal{S}|}$ (the positive 
hyperoctant) having the desired set of overlaps. 

Weighting these real, positive vectors with the stationary 
distribution yields a real, positive-element representation 
of the density operator restricted to the subspace 
spanned by the signal states. At this point, computing 
$C_q(L)$ reduces to finding eigenvalues of an $|\mathcal{S}| \times |\mathcal{S}|$ 
matrix. 
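
The last two steps can be sketched with NumPy. Here a Cholesky factorization stands in for the Gram-Schmidt-like procedure: it returns real vectors with the requested overlaps, though not necessarily with all-positive entries, and it requires a positive-definite overlap matrix. The function name and the two-state inputs are illustrative assumptions.

```python
import numpy as np


def entropy_from_overlaps(gram, stationary):
    """Von Neumann entropy (bits) of sum_i pi_i |v_i><v_i| given <v_i|v_j> = gram[i][j]."""
    gram = np.asarray(gram, dtype=float)
    pi = np.asarray(stationary, dtype=float)
    # Rows of V are real vectors whose inner products reproduce gram (gram = V V^T).
    V = np.linalg.cholesky(gram)
    # Restricted density matrix: only |S| x |S|, whatever the embedding dimension.
    rho = V.T @ np.diag(pi) @ V
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # drop numerical zeros before the log
    return float(-np.sum(evals * np.log2(evals)))


# Two signal states with overlap 0.5, equally weighted (hypothetical values).
print(entropy_from_overlaps([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5]))
```

Because the entropy depends only on the overlaps and the weights, any set of vectors with the right Gram matrix gives the same answer.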

For example, consider the Nemo Process. 
The sequence of overlap increments for L = 
[0, 1, 2, 3, 4, 5, 6, 7, 8, ...], for $\langle\eta_0|\eta_1\rangle$, $\langle\eta_1|\eta_2\rangle$, $\langle\eta_2|\eta_0\rangle$ 
respectively, is given by: 


\/p(l -P) 


X [0, 0, 0, a^, a^, a^, ,a^,...] 



X [0, 0, a°, a°, a°, a^, a^, a^, a^,...] , and 
X [0, a°, o°, a°, o^, a^, a^, a^, a^,...] , 


where $a = (1 - p)/2$. 

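And, the asymptotic cumulative overlaps are given by: 

$$\langle\eta_0|\eta_1\rangle = \frac{\sqrt{p(1-p)}}{1+p}, \quad \langle\eta_1|\eta_2\rangle = \frac{\sqrt{p}}{1+p}, \quad \text{and} \quad \langle\eta_2|\eta_0\rangle = \frac{\sqrt{2p}}{1+p}.$$
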
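The common denominator $1+p$ is what a geometric series with ratio $a = (1-p)/2$ produces. As a sketch, assume an overlap receives one contribution per pass around a pair-state cycle, with $c$ (a symbol introduced here for illustration) denoting the first nonzero increment:

```latex
\sum_{n=0}^{\infty} c\, a^{n} \;=\; \frac{c}{1-a} \;=\; \frac{2c}{1+p},
\qquad a = \frac{1-p}{2} .
```
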


From this, we computed the restricted density matrix 
and, hence, its $L \to \infty$ entropy $C_q(\infty) \approx 1.0332$, as illustrated 
in Fig. 3. The density matrix and eigenvalue forms 
are long and not particularly illuminating, and so we do 
not quote them here. A sequel details a yet more efficient 
analytic technique based on holomorphic functions 
of the internal-state Markov chain of a related quantum 
transient structure. 


DISCUSSION 

Recalling our original motivation, we return to the concept 
of pattern; in particular, its representation and 
characterization. We showed that, to stand as a canonical 
form, a process’ quantum representation should encode, 
explicitly in its states, process correlations over a sufficiently 
long horizon-length. In demonstrating this, our 
examples and analyses found that the q-machine generally 
offers a more efficient quantum representation than 
the alternative previously introduced [1]. 

Interestingly, the length scale at which our construction’s 
compression saturates is the cryptic order, a recently 
introduced measure of causal-state merging and 
synchronization for classical stochastic processes. Cryptic 
order, in contrast to counifilarity, makes a finer division 
of process space, suggesting that it is a more appropriate 
explanation for super-classical compression. We 
also developed efficient algorithms to compute this ultimate 
quantum compressibility. Their computational efficiency 
is especially important for large or infinite cryptic 
orders, which are known to dominate process space. 

We cannot yet establish the minimality of our construction 
with respect to all alternatives. For example, 
more general quantum hidden Markov models (QHMMs) 
may yield a greater advantage [3]. Proving minimality 
among QHMMs is of great interest on its own, too, as 
it should lead to a canonical quantum representation of 
classical stochastic processes. States in such QHMMs 
might then be appropriately named “quantum causal 
states”. 

And, what is the meaning of the remaining gap between 
$C_q(k)$ and $\mathbf{E}$? In the case that $C_q(k)$ is in fact 
a minimum, this difference should represent a quantum 
analog of the classical crypticity. Physically, since the 
latter is connected with information thermodynamic efficiency 
[30], it would then control the efficiency for quantum 
thermodynamic processes. 

Let’s close by outlining future impacts of these results. 
Most generally, they provide yet another motivation 
to move into the quantum domain, beyond cracking 
secure codes [31] and efficient database queries [32]. 
They promise extremely high, super-classical compression 
of our data. If implementations prove out, they will 
be valuable for improving communication technologies. 
However, they will also impact quantum computing itself, 
especially for Big Data applications, as markedly 
less information will have to be moved when it is quantum 
compressed. 